Corpus Design for Malay Corpus-based Speech Synthesis System

Author

  • Tian-Swee Tan
Abstract

Problem statement: The speech corpus is one of the major components of corpus-based synthesis. Its quality and coverage affect the quality of the synthesized speech. Approach: This study proposes a corpus design for a Malay corpus-based speech synthesis system. It covers the design criteria for corpus-based speech synthesis, the design of the Malay corpus-based database, and the concatenation engine of the Malay corpus-based synthesis system. A 10 million digital text corpus for the Malay language was collected from Malay internet news. This text corpus was analyzed using a word frequency count to identify the high-frequency words to be used in designing the sentences for the speech corpus. Results: Altogether, 381 sentences for the speech corpus were designed using 70% of the high-frequency words from the 10 million text corpus. The speech corpus consists of 16826 phoneme units, and its total storage size is 37.6Mb. All the phone units are phonetically transcribed to preserve the phonetic context of their origin, which is used for phonetic context units. The speech corpus was labeled at the phoneme level and used for variable-length continuous phoneme-based concatenation. Conclusion/Recommendation: This study proposes a platform for designing a speech corpus, especially for Malay text-to-speech, which can be further enhanced to provide greater coverage and more natural synthetic speech.
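The word frequency analysis described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the tokenization rule, and the tiny sample sentences are all hypothetical, and reading "70%" as cumulative token coverage is an assumption, since the abstract does not say how that figure was computed.

```python
import re
from collections import Counter


def word_frequencies(texts):
    """Count how often each lowercase word occurs across an iterable of strings."""
    counts = Counter()
    for text in texts:
        # Keep hyphenated reduplications such as "kanak-kanak" as one token.
        counts.update(re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower()))
    return counts


def high_frequency_words(counts, coverage=0.7):
    """Return the most frequent words whose occurrences together reach
    `coverage` of all tokens (an assumed reading of the 70% figure)."""
    total = sum(counts.values())
    selected, covered = [], 0
    for word, n in counts.most_common():
        if total == 0 or covered / total >= coverage:
            break
        selected.append(word)
        covered += n
    return selected


# Hypothetical sample standing in for the news text corpus.
sample = [
    "saya makan nasi",
    "saya minum air",
    "dia makan roti",
]
counts = word_frequencies(sample)
top_words = high_frequency_words(counts, coverage=0.7)
print(counts.most_common(2))
print(top_words)
```

The resulting word list could then guide which words the recorded sentences should contain; how the 381 sentences were actually composed from those words is not stated in the abstract.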


Similar resources

Preparation of MaDiTS corpus for Malay dialect translation and speech synthesis system

This paper presents our work in acquiring a Malay dialect translation and speech synthesis corpus. In this study, an architecture for speech corpus acquisition, which includes Malay dialect translation and Malay dialect grapheme-to-phoneme (G2P) conversion, is proposed. The pronunciation dictionary for dialectal Malay was generated with a G2P tool. As dialectal Malay is considered a scarce resource, di...

Statistical Parametric Evaluation on New Corpus Design for Malay Speech Articulation Disorder Early Diagnosis

Corresponding author: Tan Tian Swee, Medical Implant Technology Group (MediTEG), Cardiovascular Engineering Center, Material Manufacturing Research Alliance (MMRA), Faculty of Biosciences and Medical Engineering, Universiti Teknologi Malaysia, Malaysia. Abstract: Speech-to-text, also known as speech recognition, plays an important role nowadays, especially...

A cross-cultural study of request speech act: Iraqi and Malay students

Several studies have indicated that the range and linguistic expressions of external modifiers available in one language differ from those available in another language. The present study aims to investigate the cross-cultural differences and similarities with regard to the realization of request external modifications. To this end, 30 Iraqi and 30 Malay u...

Restricted Domain Malay Speech Synthesizer Using Syntax-Prosody Representation

A restricted-domain speech application requires a synthesizer whose output quality matches that of the ‘slot-filler’ approach but which retains at least the minimum flexibility of a ‘genuine’ speech synthesizer. Thus, in this research study, we propose an alternative approach to creating a speech synthesizer for use in a restricted-domain speech application. I...

Development of HMM-based Malay Text-to-Speech System

This paper presents the development of a hidden Markov model (HMM)-based Malay text-to-speech (TTS) system. To our knowledge, this is the first report on the development of an HMM-based speech synthesis system for the Malay language. In this paper, we first discuss Malay speech characteristics, specifically the Malay phonological system and syllable structure. In the Malay phonological sys...

Publication date: 2009